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I.  INTRODUCTION 
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This  paper  addresses  the  problem  in  which  a  team 
of  agents  is  searching  for  a  target  in  a  bounded  region 
with  the  objective  of  finding  the  target  in  minimum 
time.  Each  agent  is  equipped  with  a  gimbaied 
sensor  that  can  be  aimed  at  nearby  areas  within  the 
search  region.  As  they  move  around  and  take  sensor 
measurements,  the  agents  gather  information  on  the 
likely  locations  of  the  target.  The  objective  of  each 
agent  is  to  move  itself  and  control  its  sensor  in  a 
way  that  minimizes  the  expected  time  for  the  team 
to  discover  the  target.  In  this  paper,  we  present  an 
algorithm  to  approximately  solve  this  optimization 
problem.  The  algorithm  is  cooperative  in  that  the 
agents  share  information  and  jointly  optimize  their 
routes  for  the  benefit  of  the  team;  it  is  graph-based 
because  the  search  routes  are  defined  by  a  sequence 
of  vertices  in  a  dynamically  updated  graph;  and 
we  use  a  receding-horizon  optimization  over  a 
constantly  changing  target  probability  density  function 
(pdf)  similar  to  model  predictive  control.  Hence  we 
call  the  algorithm  cooperative  graph-based  model 
predictive  search  (CGBMPS).  We  also  describe  our 
implementation  of  this  algorithm  on  a  test  platform 
consisting  of  2  unmanned  air  vehicles  (UAV s)  with 
gimbal-mounted  cameras. 

We  begin  with  a  brief  review  of  search  theory. 
Modern  search  theory  was  pioneered  by  Koopman 
[14],  Stone  [23],  and  others,  and  was  initially 
motivated  by  the  desire  to  develop  efficient  search 
methods  to  find  enemy  marine  vessels.  More  recently, 
agencies  such  as  the  U.S.  Coast  Guard  have  applied 
search  theory  to  search  and  rescue  missions  with 
great  success,  measured  in  saved  lives  [6].  Other 
search  applications  include  exploration,  mining,  and 
surveillance  [9]. 

Early  search  theory  focused  on  the  allocation 
of  search  attention  to  limited  areas  within  a  large 
search  region,  which  is  appropriate  to  cases  in  which 
searcher  motion  is  unconstrained.  If  this  is  not  the 
case,  we  are  presented  with  the  more  difficult  problem 
of  finding  optimal  paths  for  the  searchers.  For  regular 
unimodal  probability  distributions  for  the  target 
position,  lawnmower-like  paths  are  optimal.  However, 
when  these  probabilities  result  from  sensor  fusion 
with  false  alarms  and  non-Gaussian  noise,  one  is 
often  faced  with  highly  irregular  distribution  for 
which  the  computation  of  optimal  paths  is  difficult. 
Mangel  formulated  the  continuous  search  problem  in 
[16]  as  an  optimal  control  problem  on  the  searcher’s 
velocity  subject  to  a  constraint  in  the  form  of  a  partial 
differential  equation,  but  this  problem  is  only  solvable 
for  very  simple  initial  target  probability  distributions. 
A  more  practical  approach  is  to  discretize  the  search 
space  and  formulate  the  search  problem  on  a  graph. 
Under  such  a  formulation,  the  set  of  search  paths  is 
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restricted  to  piecewise  linear  paths  connecting  a  finite 
set  of  graph  vertices.  Although  this  discretization 
simplifies  the  problem  to  some  extent,  it  is  still 
computationally  difficult.  Trummel  and  Weisinger 
showed  in  [26]  that  the  single-agent  search  problem 
on  a  graph  is  NP-hard  even  for  a  stationary  target.  In 
[5],  Eagle  and  Yee  were  able  to  solve  somewhat  larger 
problems  than  what  had  previously  been  possible 
by  formulating  the  problem  as  a  nonlinear  integer 
program  and  using  a  branch  and  bound  algorithm  to 
solve  it.  However,  the  size  of  computationally  feasible 
problems  was  still  severely  limited.  DasGupta  et  al. 
presented  an  approximate  solution  for  the  continuous 
stationary  target  search  based  on  an  aggregation  of 
the  search  space  using  a  graph  partition  [4].  In  [21], 
we  used  a  similar  approach,  but  with  a  “fractal” 
formulation,  which  allowed  the  partitioning  process  to 
be  implemented  on  multiple  levels.  All  of  the  results 
mentioned  so  far  are  for  a  single  search  agent. 

The  problem  gains  additional  complexity  when 
we  consider  multiple  agents  that  are  cooperatively 
searching  for  a  target.  There  are  several  methods 
for  reducing  this  complexity.  Some  researchers  have 
achieved  this  through  emergent  behavior,  which  is 
particularly  effective  when  communication  is  severely 
limited  [22,  8].  Chandler  and  Pachter  approached 
this  problem  by  splitting  the  agents  (UAVs)  into 
subteams  and  applying  hierarchical  decomposition 
to  the  tasks  [2],  Another  effective  way  to  reduce 
the  computation  involved  in  finding  optimal  search 
paths  to  locate  a  moving  target  is  to  restrict  the 
optimization  to  a  finite,  receding  horizon.  Somewhat 
surprisingly,  even  very  short  optimization  horizons 
can  yield  good  results.  Hespanha  et  al.  showed  in  [10] 
that  in  pursuit-evasion  games  played  on  a  discrete 
grid,  one  step  Nash  policies  result  in  finite-time 
evader  capture  with  probability  one.  Using  slightly 
longer  horizons,  Polycarpou  et  al.  introduced  a 
finite-horizon  look-ahead  approach  in  [19]  similar  to 
model  predictive  control  as  part  of  a  cooperative  map 
learning  framework.  Several  other  researchers  have 
considered  similar  look-ahead  policies  for  coordinated 
search  based  on  Bayesian  learning  models  [24,  7].  A 
challenge  faced  by  these  receding-horizon  algorithms 
is  that  because  they  require  the  solution  to  an  NP-hard 
search  problem  at  each  time  step,  the  prediction 
horizon  must  be  kept  short  to  ensure  computational 
feasibility.  This  means  that  if  there  are  regions 
of  high  target  probability  separated  by  a  distance 
that  is  much  greater  than  the  search  horizon,  the 
algorithm’s  performance  may  degrade  significantly. 
One  does  not  have  to  produce  pathological  examples 
to  generate  such  a  situation,  because  even  a  uniform 
initial  distribution  can  evolve  into  a  highly  irregular 
distribution  as  the  searchers  move  through  the  region 
and  update  their  target  pdf  estimates  based  on  sensor 
measurements. 


With  the  CGBMPS  algorithm,  we  address  this 
issue  by  performing  the  receding-horizon  optimization 
on  a  dynamically  changing  graph  whose  nodes  are 
carefully  placed  at  locations  in  the  search  region 
having  the  highest  target  probability.  The  prediction 
horizon  is  defined  as  a  fixed  number  of  graph 
edges,  but  the  edges  of  the  graph  are  not  uniform  in 
length.  The  algorithm  chooses  paths  that  maximize 
the  probability  of  finding  the  target  per  unit  time. 

This  dynamic  graph  structure  facilitates  two  key 
strengths  of  our  algorithm:  1)  search  agents  only 
perform  detailed  searches  in  regions  with  high  target 
probability,  and  2)  long  paths  can  be  efficiently 
evaluated  and  fairly  compared  with  short  paths. 

The  remainder  of  the  paper  is  organized  as 
follows.  Section  II  presents  the  cooperative  search 
problem  in  a  general  discrete  time  setting.  Section  III 
describes  our  approach  to  this  problem  in  the  form 
of  a  receding-horizon  search  algorithm  on  a  graph.  In 
Section  IV,  we  present  the  main  results  on  finite-time 
target  discovery.  Section  V  provides  the  numerical 
simulations  of  the  algorithm  and  discusses  the  results. 
Section  VI  describes  a  hardware  implementation  of 
the  algorithm  along  with  the  results  of  several  field 
tests  using  two  UAVs  with  gimbal-mounted  cameras 
in  a  hardware-in-the-loop  simulation  environment.  In 
Section  VII,  we  provide  conclusions  and  some  ideas 
for  future  research. 

II.  SEARCH  PROBLEM  FORMULATION 

This  section  formulates  the  search  problem  as 
a  discrete-time  optimization  in  which  a  team  of 
autonomous  agents  controls  their  positions  and  sensor 
orientations,  with  the  goal  of  finding  a  target  in 
minimum  time. 

A.  Agent  Dynamics  and  Sensor  Coverage 

Suppose  a  team  of  M  agents  is  searching  for  a 
mobile  target  in  a  bounded  planar  region  1Z  C  M2. 

Each  agent  is  equipped  with  a  gimbaled  sensor  that 
it  can  aim  at  a  limited  area  of  the  search  region.  The 
region  that  a  sensor  can  view  at  a  particular  time 
instant  is  called  the  field  of  view  (FOV),  and  the 
subset  of  1Z  viewable  by  the  sensor  as  it  is  swept 
through  its  entire  range  of  motion  (while  the  agent 
on  which  the  sensor  lies  is  stationary)  is  called  the 
sensor’s  field  of  regard  (FOR).  Fig.  1  shows  the 
relationship  between  agent,  FOV,  and  FOR. 

Fet  A  C  R3  be  the  space  in  which  the  agents 
move.1  We  denote  the  position  state  of  the  search 
team  by  p(k)  :  =  [pi(k),p2(k),...,pM(k)],  where  pjk)  <E 
A  is  agent  a’ s  position  at  time  k,  and  k  £  is  a 
discrete  time  variable  belonging  to  the  nonnegative 
integers.  We  assume  a  simple  kinematic  model  for 


1  The  agents  are  allowed  to  move  in  three  dimensions  for  generality 
of  the  approach,  and  since  the  sensor  properties  may  be  affected  by 
altitude. 
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Fig.  1 .  Diagram  of  sensor  FOR  and  FOV  for  search  agent 
located  at  position  p(k). 


the  agents,  which  can  move  in  any  direction  with 
unit  velocity.  For  each  agent,  we  define  a  sensor 
task  qa(k)  £  7Z  that  specifies  the  starepoint  where 
agent  a  will  point  its  sensor  at  time  k,  and  we  denote 
the  vector  of  sensor  tasks  for  the  team  by  q (k) :  = 
[ql(k),q2(k),...,qM(k)].  The  FOV  associated  with 
a  particular  sensor  task  will  generally  depend  not 
only  the  starepoint  qa(k),  but  also  on  the  agent’s 
position  pa(k)  and  the  characteristics  of  the  sensor. 

For  simplicity  of  presentation,  we  assume  that  there 
are  no  additional  degrees  of  freedom  associated  with 
the  sensor,  such  as  zooming,  that  could  change  the 
FOV  while  the  agent  position  and  sensor  task  remain 
fixed.  Denoting  the  set  of  all  possible  sensor  tasks  by 
Q,  and  the  set  of  all  subsets  of  TZ  by  V(7Z),  we  define 
a  function  FOV  :  Vl  x  Q  — >  V(7Z)  that  maps  an  agent’s 
position  and  the  point  at  which  its  sensor  is  aimed  to 
the  subset  of  7Z  in  view  of  that  sensor.  Similarly,  we 
use  the  function  FOR  :  A  — >  'P(R)  to  denote  the  subset 
of  the  search  region  lying  in  the  FOR  of  an  agent. 

B.  Sensor  Model  and  Target  Detection 

Suppose  an  agent  at  position  p(k)  aims  its  sensor 
at  the  point  q(k)  at  some  instant  in  time  k.  A  target 
located  at  a  point  x  £  FOV( p(k),q(k))  will  have  some 
probability  of  being  correctly  detected  by  the  sensor. 
We  generally  refer  to  this  value  as  the  probability 
of  detection,  and  denote  it  by  PD(p(k),q(k),x(k))  £ 
[pmin,  1],  where  pmm  >  0  denotes  a  minimum  value  for 
the  probability  of  detection.  Although  it  is  common 
to  assume  that  PD  is  a  constant,  we  allow  for  a  more 
a  general  case  to  account  for  changes  in  the  size  and 
shape  of  the  FOV  for  different  vehicle  altitudes  and 
sensor  orientations,  as  well  as  possible  line-of-sight 
obstructions  within  the  FOV.  We  refer  the  reader  to 
[15]  for  a  more  detailed  discussion  of  these  issues  in 
the  context  of  target  detection  with  a  video  sensor. 

The  probability  of  detecting  a  target  that  does  not  lie 


in  the  sensor’s  FOV  is  zero.  It  is  notationally  useful  to 
have  an  expression  for  the  probability  of  detection  for 
a  target  located  anywhere  in  the  search  region;  hence 
we  define  a  general  detection  function 

D(p(k),q(k),x(k)) 


=  f  PD(p(k),q(k),x(k)),  x(k)  £  FOV(p(k),qm 

[ 0,  otherwise 

(1) 

In  this  paper,  we  assume  that  the  sensors  do  not 
produce  false  positives;  i.e.,  the  sensors  do  not  detect 
the  presence  of  a  target  where  none  exists. 


C.  Target  State  Estimation 

Let  the  random  variable  X(k)  £  7 Z  denote  the  state 
of  the  target.  By  the  nature  of  the  search  problem, 
the  target  state  is  unknown,  and  the  objective  of 
the  agents  is  to  gather  information  on  X(k)  using 
their  sensors.  In  order  to  optimize  how  the  agents 
and  sensors  should  move,  we  need  a  method  for 
estimating  the  probability  distribution  of  X(k)  as  it 
evolves  over  time.  With  the  possible  exception  of 
some  very  simple  cases,  optimal  estimators  such 
as  the  Kalman  filter  do  not  apply  to  the  search 
problem  because  target  motion  may  be  nonlinear 
and  Gaussian  assumptions  on  the  probability 
distribution  do  not  hold.  Even  if  one  assumes  a 
Gaussian  initial  distribution,  sensor  measurement 
updates  will  cause  future  probability  distributions 
to  be  non-Gaussian.  Consequently,  target  estimation 
is  a  challenging  problem  and  a  rich  field  of  study 
in  itself.  We  refer  the  reader  to  [1]  and  [11]  for  a 
deeper  analysis  of  the  problem  and  several  good 
examples  of  target  estimators.  For  the  purposes  of 
this  paper,  rather  than  choosing  a  specific  approach, 
we  provide  an  algorithm  that  applies  to  a  wide  range 
of  estimation  methods.  In  particular,  we  work  with 
the  class  of  estimators  that  characterize  the  target 
pdf  with  a  discrete  approximation  to  an  underlying 
continuous  pdf.  We  choose  this  class  of  estimators, 
which  includes  grid-based  probabilistic  maps  as 
well  as  particle  filters,  because  of  their  ability  to 
accurately  incorporate  nonlinear  target  dynamics  and 
environmental  constraints  such  as  road  networks,  as 
well  as  accurately  model  non-Gaussian  distributions. 

In  the  remainder  of  this  section,  we  assume  that 
the  pdf  of  the  target  state  X(k)  is  approximated  by  a 
function  fk(-)  of  the  following  form: 


fk(x)  =  ^2  wi(k)S(Xj(k)  -  x),  Vk  £  Z>0,  Vx  £  1Z 
!  =  1 

(2) 

where  each  x;(k)  £  7 Z,  each  w((k)  £  [0, 1],  and  b(-) 
denotes  a  function  that  integrates  to  one  over  the 
search  region; 


in 


6(Xj(k)  —  x)dx  =  1, 


Vi  € 


(3) 
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For  example,  <5(  )  could  take  the  form  of  a  Dirac  delta 
function,  for  use  with  a  point-mass  approximation  to 
a  probability  distribution,  or  it  could  take  the  form 
of  a  two-dimensional  rectangular  impulse  function, 
for  the  case  of  grid-based  approximation.  The  pdf 
approximation  in  (2)  is  a  type  of  probability  mixture 
model  [17],  which  also  has  the  property  that  the  sum 
of  all  weights  wt(k)  is  equal  to  one: 

n 

^w,(k)=  1.  (4) 

i=  1 

We  define  the  target  probability  mixture  model 
(TPMM)  as  the  pair  [x(k),w(k)],  where  x(k)  =  \xt  (k), 
x2(k),...,xn(k)]  and  w (k)  =  [wl(k),w2(k),...,wn(k)]. 
Where  there  is  no  ambiguity,  we  may  also  refer  to  the 
TPMM  as  the  target  pdf  in  the  remainder  of  the  paper. 

The  initial  state  of  the  TPMM,  [x(0),w(0)]  is 
based  on  any  prior  knowledge  of  target  location, 
e.g.  an  initial  probability  distribution,  or  a  uniform 
distribution  if  no  prior  target  information  is  available. 
Future  TPMM  states  depend  on  sensor  tasks  q (k)  and 
the  prior  state.  The  dynamics  of  x(k)  and  w(k)  will 
depend  on  the  estimator  used,  but  can  be  expressed  in 
general  form  as 

x(k  +  1 )  =  fx(x(k),xv(k),  q(k)')  $ 

w  (k  +  1 )  =  fw(x(k),  w(k),  q(k)) 

where  the  functions  fx  and  fw  must  preserve  the 
requirements  xt(k  +  1)  €  7 Z  and  w;(k  +  1)  e  [0, 1] 
for  each  i,  as  well  as  condition  (4).  In  (5),  we  have 
defined  a  general  model  that  is  compatible  with 
many  of  the  typical  target  models  used  in  the  search 
literature,  including  Markov  models  [7]  and  particle 
filters  [25].  In  the  Appendix,  we  provide  example 
dynamics  of  (5)  for  a  simple  particle  filter  and  a 
probabilistic  map. 

D.  Search  Reward 

The  objective  in  the  search  problem  is  to  find  the 
target  in  minimum  time.  To  achieve  this,  the  agents 
must  be  able  to  evaluate  the  probability  of  finding  the 
target  for  various  sequences  of  control  actions.  In  this 
section,  we  define  a  criterion  called  search  reward  to 
measure  this  probability. 

1)  Single  Agent  Target  Detection'.  Recall  from 
Section  IIB  that  when  an  agent  at  position  p  aims 
its  sensor  at  the  point  q ,  the  probability  that  a  target 
located  at  the  point  x  will  be  detected  is  given  by 
D(p,q,x),  which  was  defined  as 

n,  ,  (PD(p,q,x),  x&FOV(p,q) 

0 .  otherwise 

for  a  given  instant  in  time  (we  omit  the  time  variable 
k  for  simplicity  of  notation).  Since  the  location  of 
the  target  is  unknown  in  the  search  problem,  we  now 


define  the  probability  of  detection  conditioned  on  our 
best  estimate  of  the  target  position,  [x,w]  as  discussed 
in  Section  IIC.  First,  we  note  that  the  portion  of  the 
ith  TPMM  component  that  lies  within  the  sensor’s 
FOV  can  be  expressed  as  follows: 

/  d(xi  x)D( p,  q, x)dx. 

Jk 

Let  V  be  the  event  that  the  single  agent  discovers 
the  target.  One  can  compute  the  probability  of  V 
by  applying  the  total  probability  theorem  over  the  n 
components  of  the  TPMM: 

n 

PCD)  =  V'  Wj  /  6(X/  x)D(p,q,x)dx.  (6) 

;=i  Jn 

2)  Multi-Agent  Target  Detection'.  Consider  now 
the  case  in  which  M  agents  are  searching  for  the 
same  target,  cooperatively.  That  is,  they  share  a 
common  goal  of  finding  the  target  in  minimum  time, 
regardless  of  which  agent  is  the  first  to  discover  it. 

The  agents  cooperate  by  sharing  a  common  target 
pdf  [x,w],  which  each  agent  updates  as  it  gathers 
new  information  from  its  sensors.  In  this  section,  we 
are  interested  in  calculating  the  team  probability  of 
detection,  denoted  by  P(27team),  where  Dteam  is  the 
event  that  the  target  is  discovered  by  at  least  one  of 
the  agents  at  a  given  instant  in  time. 

Suppose  M  agents  at  positions  (pl,...,pM)  aim 
their  sensors  at  the  points  (ql,...,qM).  The  FOVs 
associated  with  each  agent’s  sensor  task  may  or 
may  not  overlap  in  the  search  region.  The  following 
sequence  of  joint  events  enumerates  all  situations  that 
lead  to  2?team,  keeping  in  mind  that  once  one  agent 
discovers  the  target,  it  is  irrelevant  to  our  problem 
whether  another  agent  simultaneously  discovers  the 
same  target: 

D ,  =  the  target  is  discovered  by  agent  1; 

V2  =  the  target  is  not  discovered  by  agent  1,  but  it 
is  discovered  by  agent  2; 

V3  =  the  target  is  not  discovered  by  agents  1  or  2, 
but  it  is  discovered  by  agent  3; 

VM  =  the  target  is  not  discovered  by  agents  1 
through  M  —  1,  but  it  is  discovered  by  agent  M. 

Since  these  events  are  mutually  exclusive,  the  event 
that  the  target  is  detected  by  at  least  one  agent  is 
simply  the  union  of  the  events  listed  above: 

M 

■^team  =  U  ^a' 

a=  1 

The  key  to  evaluating  the  probability  of  each  event  Va 
is  to  notice  that,  while  the  individual  events  are  not 
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independent,2  they  become  independent  when  each  Va 
is  conditioned  on  a  value  of  the  target  state.  Using  this 
property,  it  is  straightforward  to  show  from  (6)  that 


p(Pa)  =  X>/k  ^6(xi-x)D(pa,qa,x) 


a—  1 


JJ(1  -  D(pa,qa,x))j  dx  (7) 


for  each  a6{l,...,M}.  Since  the  sensor  tasks  {qa} 
happen  simultaneously,  the  order  in  which  we 
consider  the  target  detected  by  the  various  tasks 
is  irrelevant  and  the  arbitrary  ordering  of  agents 
incurs  no  loss  of  generality.  The  team’s  probability 
of  detection  may  be  calculated  by  summing  the 
probabilities  of  detection  for  each  agent: 


p<pt  eam)  =  Y,P(V«  i  x-w’P’ti) 

a=  1 

=  Wi  [  ( 6(X‘  ~  X)D(Pa^a’Xd 


a=  1  l  i=  1 


II(I  -D(pcx,qa,Xi))\dx\ . 


(8) 

3)  Cumulative  Reward:  Thus  far  we  have 
considered  instantaneous  probabilities  of  detection, 
but  the  quantity  we  wish  to  minimize  is  the  time  until 
the  target  is  detected.  This  requires  formulating  a 
search  reward  function  r(k)  in  terms  of  a  cumulative 
probability  of  detection,  as  follows. 

Let  T*  denote  a  hypothetical  time  at  which  the 
target  is  found  by  any  of  the  search  agents.  We  define 
the  search  reward  as  the  probability  that  the  target  is 
found  at  time  k: 


We  can  now  rewrite  the  search  reward  as 


r(k)  :=  P(T*  =k\T*>k) 


k- 1 

X/(«) 

K  =  0 


(11) 


The  term  P(T*  =  k  \  T*  >  k)  is  the  conditional 
probability  that  the  target  will  be  found  at  time  k 
given  that  it  was  not  found  previously.  Since  [x,w] 
is  conditioned  on  the  target  not  yet  being  found, 


P{T*  =k\T*>k)  =  P(V^Jk)).  (12) 


Combining  (8),  (11),  and  (12)  results  in  the  following 
expression  for  the  search  reward: 


M  (  n  ..  / 

r(k)  =  ^  <j  Yw,(k)  /  (6{xi-x)D(pa(k),qa(k),xi{k)) 

a=  1  l  ;=i  \ 

x  Il(1  “  (k)))  j  dx\ 

*  ('i-y^oo]  •  (i3) 


It  is  also  useful  to  have  an  expression  for  the  reward 
collected  by  each  individual  agent.  Therefore  we 
define  ra(k)  as  the  search  reward  for  agent  a  at  time 
k.  We  can  then  rewrite  (13)  as 


where 


M 

r(k)  =  5>0(fc), 

a=  1 


i=  1 


^5(x,.  -  x)D(pll(k),qa(k),xi{k)) 


x  II(  1  “  Wpjkl.qjk)^^))) 


dx 


r(k)  :  =  P(T*  =  k). 
We  can  express  the  search  reward  as 


x 


(14) 


r(0) :  =  0,  (9) 

r(k)  :=  P(T*  =  k)  =  P(T*  =k\T*>  k)P(T*  >  k ). 

(10) 

Since  the  events  of  finding  the  target  at  different  times 
are  mutually  exclusive,  we  can  express  P(T*  >  k),  the 
probability  that  the  target  has  not  been  found  before 
time  k,  as 

k—  1  k—  1 

P(T*  >  k)  =  1  -  J2p(T*  =  «)  =  1  - 

K= 0  K=0 


2  Individual  events  Va  are  not  independent  because  knowledge  about 
one  event  occurring  (or  not  occurring)  gives  us  clues  about  the 
value  of  the  target  state,  which  affects  the  outcome  of  the  other 
events. 


E.  Problem  Statement 

The  goal  of  the  search  problem  is  to  optimally 
control  the  agents  and  their  sensors  to  locate  the 
target  in  minimum  time.  We  define  a  search  policy 
to  be  a  function  //  that  maps  the  current  team  state 
p(k)  and  the  TPMM  \x(k),\\'(k)}  to  the  next  set  of 
controls  to  be  executed  by  the  team.  That  is,  each 
agent’ s  next  position  and  sensor  task  is  given  by 
[p(k  +  l),q(k  +  1)]  =  ii(p(k),x(k),w(k)).  The  objective 
in  the  search  problem  is  to  compute  the  search  policy 
that  results  in  the  fastest  possible  discovery  of  the 
target  by  the  team. 

T*  denotes  the  time  at  which  the  target  is 
discovered  by  any  agent.  This  time  is  unknown  at  the 
beginning  of  the  search,  but  we  can  write  its  expected 
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value  as 


E,[T*]  =  Y,kr(k). 

k= 0 

We  can  now  express  the  objective  of  the  search 
problem  as 

min£  in  (15) 

subject  to 

Pa(k+l)eVa(k)  (16) 

qa(k  +  1)  e  Qa(pa(k  +  1))  (17) 

where  the  set  Va(k)  c  A  denotes  the  set  of  all 
reachable  points  and  Qa(pa(k  +  1))  c  TZ  the  set  of  all 
feasible  sensor  tasks  for  agent  a  at  position  pa(k  +1). 
These  constraints  are  imposed  by  physical  limitations 
of  the  agents  and  their  gimbaled  sensors. 

For  agents  with  such  sensor  and  motion 
constraints,  the  problem  posed  in  (15)  is  a  difficult 
combinatorial  optimization  problem.  Even  for  the  case 
of  a  single  agent,  stationary  target,  and  fixed  sensor, 
this  search  problem  is  known  to  be  NP-hard  [26].  The 
mobile  target  and  movable  sensor  introduce  additional 
complexity  to  the  problem. 

III.  COOPERATIVE  GRAPH-BASED  MODEL 
PREDICTIVE  SEARCH  ALGORITHM 

In  this  section,  we  present  a  receding-horizon 
graph-based  search  algorithm  that  approximately 
solves  the  search  problem  (15) — (17).  This  algorithm 
plans  paths  and  sensor  tasks  for  a  team  of  cooperating 
search  agents  based  on  predicted  future  target  states. 
Agent  paths  are  formulated  on  a  Euclidian  graph  as 
a  sequence  of  graph  edges.  The  prediction  horizon 
is  set  to  a  fixed  number  of  graph  edges,  rather  than 
a  fixed  time,  so  that  paths  of  varying  duration  may 
be  considered.  This  allows  agents  to  travel  between 
regions  of  high  reward  that  are  separated  by  large 
areas  with  low  reward  using  just  one  path  step. 

This  approach  is  designed  to  reduce  computation 
by  optimizing  over  a  smaller  number  of  graph  edges 
and  therefore  on  a  much  smaller  state  space,  while 
also  using  careful  vertex  placement  to  maintain  route 
flexibility  in  regions  of  high  reward.  Although  our 
algorithm  does  not  necessarily  result  in  a  policy  that 
minimizes  E  [  T  ] ,  the  resulting  policy  does  lead  to 
a  finite  value  for  this  expected  time  and  we  provide 
an  upper  bound  for  this  quantity  in  Section  IV. 
Furthermore,  Section  V  provides  simulations  showing 
that  the  CGBMPS  algorithm  outperforms  previously 
proposed  search  algorithms. 

A.  Graph  Search  Formulation 

Let  G  :  =  (V,E)  be  a  Euclidian  graph  for 
path-planning  with  vertex  set  V  and  edge  set  E.  The 
vertices  of  the  graph  will  serve  as  waypoints  for 
the  agents,  which  are  connected  by  graph  edges  to 


form  paths.  When  there  is  no  ambiguity,  we  use  the 
notation  v  to  denote  both  the  vertex  in  the  set  V  and 
its  location  in  A,  the  space  in  which  the  agents  move. 
Similarly,  e  =  (v,v')  will  denote  both  the  edge  in  the 
set  E  and  the  line  segment  connecting  its  endpoint 
vertices  v  and  v' . 

A  path  in  G  is  a  sequence  of  vertices  P  :  = 
(vl,v2,...,ve)  such  that  (v;,vi+])  €  E.  Each  path 
segment  (v(-,vi+1)  will  be  considered  as  one  step  in 
the  prediction  horizon.  The  path  cost  in  G  is  the  time 
duration  of  the  path  and  is  defined  as 

r-i 

C(F):=5>(v,,v1+1)  (18) 

i=  1 

where  c(v,,vi+1)  is  an  edge-cost  function  c  :  E  — >  [0,oo) 
that  represents  the  transit  time  between  vertices, 
assuming  the  agents  move  with  unit  velocity.3 

Starting  from  an  agent’s  current  location,  the 
algorithm  selects  the  agent’s  future  route  as  an  (-step 
path  in  G  that  maximizes  a  path  reward  function. 

In  an  attempt  to  minimize  the  expected  time  £[7’*] 
to  find  the  target,  we  define  a  path  reward  function 
corresponding  to  the  team’s  probability  of  finding 
the  target  per  unit  time  [see  criterion  (20)  below]. 

This  optimization  criterion  enables  the  algorithm  to 
compare  paths  of  various  lengths. 

As  each  agent  travels  along  its  path  P,  it  executes 
a  sequence  of  sensor  tasks,  which  we  call  a  sensor 
schedule  and  denote  by  S(P,k )  :=  (q(k).q(k  +  1), 

...,q(k  +  Ip)),  where  each  q( •)  is  a  scheduled  sensor 
task  belonging  to  Q(k),  the  set  of  all  feasible  sensor 
tasks  for  the  agent  at  time  k ,  and  Tp  C(P)\  is  the 
total  number  of  sensor  tasks  that  may  be  executed 
along  the  path  P .  In  this  discrete  time  setting,  we 
assume  that  the  duration  of  each  sensor  task  is 
one  unit  of  time.  Fig.  2  shows  a  diagram  of  an 
agent  executing  a  sensor  schedule  along  its  path 
P .  The  sensor  tasks  q(k)  may  be  chosen  arbitrarily 
or  computed  by  an  optimization.  This  allows  for 
several  possible  methods  of  algorithm  implementation, 
including 

1)  Fixed  Sensor  Tasking'.  Construct  S(P,k ) 
assuming  a  fixed  pattern  of  sensor  motion,  such  as 
slewing  side  to  side  within  the  FOR. 

2)  Joint  Routing  and  Sensor  Optimization'. 

Construct  S(P,k )  by  optimizing  over  all  possible 
sensor  schedules  along  the  path  P. 

Fixed  sensor  tasking  requires  much  less  computation, 
but  when  the  speed  of  the  agents  is  too  fast  for  the 
sensors  to  view  everything  inside  the  agents’  FORs, 
or  urban  scenarios  with  line-of-sight  blockages, 
method  2  may  yield  significant  benefit  over  method  1 . 


3UAVs  in  the  presence  of  wind  may  move  with  unit  air  velocity, 
but  have  different  ground  velocities.  This  can  be  easily  incorporated 
into  the  algorithm  so  that  predicted  paths  use  UAV  ground  velocity 
(accounting  for  wind).  In  this  case  upwind  paths  have  higher  costs. 
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Fig.  2.  Example  sensor  schedule  for  agent  traveling  along 
path  P. 


Whichever  method  is  used,  the  sensor  tasking 
algorithm  must  contain  a  model  of  the  sensor  gimbal 
so  that  it  will  only  generate  candidate  sensor  tasks 
within  the  gimbal  range  limits. 

The  computation  of  the  path  reward  depends  on 
which  method  of  sensor  tasking  is  used.  Suppose 
that  the  search  agents  1}  have  planned 

paths  {/■}?“/  and  corresponding  sensor  schedules 
{S(Pi,k)}f~) .  For  fixed  sensor  tasking,  we  define  the 
path  reward  for  agent  a  traveling  along  candidate  path 
Pa  as 

k+TPa 

(19) 

K=k 

where  ra(n)  is  computed  using  (14)  with  a  sensor 
schedule  S(Pa,k )  that  is  computed  by  some  fixed 
algorithm.  Note  that  any  paths  { P,  \fLa+ 1  f°r  the  agents 
{a  +  1  ,...,M}  do  not  affect  (19).  The  path  reward 
for  joint  routing  and  sensor  optimization  includes  an 
optimization  over  all  sensor  schedules,  and  can  be 
expressed  as  follows: 

k  +  Tpa 

Ra(pi  •  ■  •  •  >Pa)  ■  =  max  rjn).  (20) 

K=K 


To  provide  a  fair  measure  of  reward  over  paths  of 
various  lengths,  we  define  the  normalized  path  reward 
Ra  for  the  path  Pu  as  the  path  reward  divided  by  the 
path  cost: 

(21) 

Since  the  time-step  is  defined  as  the  duration  of  one 
sensor  task,  the  reweighting  in  (21)  is  equivalent  to 
dividing  by  the  number  of  sensor  tasks  in  the  plan. 
One  can  think  of  (21)  as  a  reward-per-sensor-task 
function,  and  optimizing  this  function  should  yield 
collections  of  tasks  that  give  the  highest  reward  per 
task. 

As  mentioned  previously,  each  agent  computes  a 
new  path  whenever  it  reaches  a  waypoint.  Since  the 
graph  edges  have  nonuniform  length,  the  agents  will 
generally  arrive  at  waypoints,  and  hence  compute 
paths,  at  different  times.  At  each  time  k,  we  therefore 
define  two  groups  of  agents:  the  set  Aplan(k)  of  agents 
that  have  arrived  at  waypoints  and  need  to  plan  new 
paths,  and  the  set  Aen(k)  of  the  remaining  agents  that 
are  en  route  to  waypoints.  In  the  CGBMPS  algorithm, 
whenever  the  set  A  lan(&)  is  nonempty,  the  agents  in 
that  set  compute  new  paths  assuming  that  the  agents 
in  the  set  Aen(k)  will  continue  on  their  current  paths. 
With  this  formulation,  we  can  express  each  step  in  the 
receding-horizon  optimization  as 

M 

{P* :  a  e  Aplan(A-)}  :  =  arg max  ^R^P i?). 

{/>:asAplan(*:)}  ;=1 

(22) 

Often  the  set  Aplan(k)  consists  of  a  single  agent, 
reducing  (22)  to  an  optimization  for  the  path  of 
that  agent.  If  this  is  not  the  case,  a  computationally 
practical  approximation  to  (22)  consists  of  a 
sequential  optimization  by  the  planning  agents,  which 
we  can  express  as 

P* :  =  arg  max  fl„(P*,. .  .,P*_i,Pa)  Vo  e  A  lan(k) 

Pc, 

(23) 

where  each  P*  is  either  the  current  path  of  an  en  route 
agent  i  €  Aen(k)  that  was  computed  at  a  previous  time 
or  a  path  computed  using  (23)  by  an  agent  earlier  in 
the  sequence  Aplan(k).  For  simplicity  of  notation  and 
without  loss  of  generality,  we  assume  in  (23)  that  the 
agents  {1  are  reindexed  (if  necessary)  so  that 

the  agents  in  Aen(k)  have  lower  indices  than  those  in 

^plan(^)- 

Notice  that  since  each  agent  evaluates  reward  over 
a  different  time  period  TP  some  agents  may  plan 
paths  further  into  the  future  than  others.  This  does 
not  pose  a  problem  for  the  algorithm,  since  (21)  is 
evaluated  by  accounting  for  only  the  reward  that  prior 
agents  have  planned  to  collect  up  until  their  respective 
time  horizons.  However,  there  is  the  potential  for  an 
agent  A  to  “steal”  reward  from  another  agent  B  if 
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(C)  (d) 

Fig.  3.  Step-by-step  example  of  graph  construction  process.  Black  points  represent  states  of  particle  filter  approximation  to  target  pdf. 
(a)  Place  hexagonal  lattice  over  search  region,  (b)  Apply  threshold  to  edges,  (c)  Connect  using  Delaunay  triangulation. 

(d)  Combine  graphs  b  and  c  for  final  graph. 


agent  A’ s  path  covers  a  longer  time  period  and  cuts 
in  front  of  where  agent  B  would  be  expected  to  go 
had  it  planned  a  longer  path.  In  this  case,  at  its  next 
waypoint,  agent  B  must  simply  plan  a  course  that 
takes  agent  A’s  impinging  path  into  account. 

B.  Dynamic  Graph  Generation 

Although  the  CGBMPS  algorithm  will  work  with 
any  graph,  the  structure  of  the  graph  has  a  significant 
impact  on  computation,  and  the  vertex  and  edge 
placements  will  affect  the  algorithm’s  performance. 
Also,  since  the  target  probability  distribution  is 
constantly  changing  due  to  sensor  measurements 
and  predicted  target  motion,  the  graph  should  be 
periodically  updated  to  keep  the  agents  searching 
only  high-reward  areas.  This  graph  update  should 
occur  at  each  time  k  at  which  the  set  of  agents 
Apian(k)  is  nonempty.  We  now  present  a  dynamic 
graph  construction  process  that  guarantees  desirable 
performance  properties  of  the  CGBMPS  algorithm. 
Fig.  3  shows  an  example  of  this  process. 


In  the  following  procedure,  we  use  the  function 
VF(x,w,v)  to  denote  the  sum  of  the  weights  of  all 
components  of  the  TPMM  lying  inside  the  FOR  of 
a  point  v  €  A ,  that  is 

n  r. 

W(x,  w,  v) :  =  V'  vv;  /  h(xt  —  x )  •  inFOR(v,x)i/x 
1=1  Jn 

where  inFOR(v,x)  €  {0, 1}  indicates  whether  or  not 
the  point  x  is  inside  the  FOR  of  the  point  v,  and  is 
expressed  by 

fl,  xgFOR(v) 

inFOR(v,x)  :=  <  .  (24) 

( 0,  otherwise 

Graph  Construction  Process'. 

1)  Construct  a  uniform  lattice  graph  G  :=  ( V,E ) 
over  the  search  region  1Z  having  the  property  that  for 
every  point  r  in  R,  there  exists  a  vertex  v  £  V  such 
that  the  point  r  falls  within  the  FOR  of  v.  This  can 
always  be  achieved  by  choosing  a  sufficiently  small 
vertex  spacing  in  the  lattice  [cf.  Fig.  3(a)]. 
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2)  Choose  a  reward  threshold  r  >  0  and  let  Gk  :  = 

(Vk,Ek)  be  the  subgraph  of  G  induced  by  the  vertex 
set  VkT  :  =  {v  €  V  :  W(x(fc),w(U),v)  >  r}  U  V1,  the  set 
of  vertices  for  which  the  reward  is  no  less  than  r 
combined  with  the  set  V 1  :=  of  vertices 

that  serve  as  initial  waypoints  for  the  agents.  The  edge 
set  El  consists  of  all  edges  in  E  that  connect  pairs  of 
vertices  that  both  belong  to  the  set  VkT  [cf.  Fig.  3(b)], 

3)  Let  G[)d  :=  (Vk,Ek>d)  be  the  graph  generated 
by  a  Delaunay  triangulation  of  Vk  [18].  Next,  let 
Gfel<rf  :=  (Vk,E'^'<d)  be  the  subgraph  of  Gfel 
obtained  by  keeping  only  the  edges  connecting 
vertices  of  degree  less  than  d ,  where  d  is  the  degree  of 
the  lattice  graph  G.  That  is,  Ekel<d  :  =  {(v,  v')  €  Elk)el  : 
deg(v)  <  d  and  deg(v')  <  d}  [cf.  Fig.  3(c)]. 

4)  The  final  graph  Gk  :=  (Vk,Ek)  is  the 
combination  of  graphs  Gk  and  Gfel<d,  with  vertex  set 
Vk  :=  Vk  and  edge  set  Ek  :=  El  U£fel<J  [cf.  Fig.  3(d)]. 
In  step  1 ,  any  type  of  lattice  graph  will  suffice,  but 
we  have  found  that  a  degree-3  hexagonal  lattice  is  a 
particularly  good  choice.  The  threshold  r  in  step  2 
should  be  carefully  chosen  so  that  it  includes  only 
high-reward  edges  but  does  not  exclude  so  many 
edges  that  the  number  of  feasible  paths  is  severely 
restricted.  The  purpose  of  step  3  is  to  ensure  that  the 
graph  Gk  is  connected  and  provides  paths  between 
unconnected  areas  of  high  reward.  We  include  only 
the  Delaunay  edges  connecting  vertices  of  degree  less 
than  3  because  these  are  the  edges  between  boundary 
vertices  of  the  disconnected  components  in  the  graph. 
The  remaining  Delaunay  edges  are  unnecessary.  Step 
4  combines  the  graphs  of  steps  2  and  3  to  produce  the 
final  graph. 

An  additional  consideration  in  the  graph 
construction  is  that  to  ensure  achievable  flight  paths, 
the  initial  graph  vertex  spacing  must  be  chosen  coarse 
enough  that  an  agent  can  move  from  any  vertex  to  any 
other  vertex  without  looping.  This  is  easy  to  compute 
from  the  minimum  turn  radius  of  the  agents,  and  in 
our  experience  with  UAVs,  results  in  a  vertex  spacing 
that  still  satisfies  the  requirement  of  step  1. 

C.  Non-Zero-Reward  Paths 

In  this  section,  we  show  that  for  graphs 
constructed  by  the  above  process,  there  always  exist 
search  paths  starting  from  any  vertex  on  which  the 
agents  can  collect  positive  reward,  provided  the 
following  assumption  holds. 

Assumption  1  There  exists  a  constant  7  >  0  such 
that  for  every  point  v  €  A  and  for  every  m  €  N,  the 
evolution  of  [x(L),  w(k)]  governed  by  (5)  satisfies  the 
property 

W(x(k  +  m),w(k  +  m),v)  >  "fmW(x(k),w(k),v). 

This  assumption  is  a  requirement  on  the  dynamics 
of  [x(A'),w(L)];  it  essentially  states  that  the  total  target 


weight  within  the  FOR  of  a  vertex  cannot  drop  below 
a  threshold  value  in  a  finite  number  of  time  steps. 

In  a  grid-based  probabilistic  map,  for  example,  this 
assumption  is  satisfied  provided  that  there  is  at  least 
one  non-zero  entry  in  the  columns  of  the  transition 
probability  matrix  for  the  cells  inside  the  FOR  of  v. 
The  assumption  will  hold  in  a  particle  filter  provided 
a  large  enough  number  of  particles  and  also  enough 
randomization  in  the  model  to  make  sure  that  while 
many  particles  may  leave  a  region,  others  may  enter 
so  that  the  total  weight  does  not  decay  to  zero.  Under 
this  assumption  on  the  TPMM,  the  following  lemma 
provides  a  lower  bound  on  the  conditional  probability 
that  the  target  will  be  found  at  some  time  in  the 
interval  [k  +  1  ,k  +  T]  given  that  it  was  not  found  at 
or  before  time  k.  This  result  will  be  instrumental  in 
proving  that  the  CGBMPS  algorithm  is  able  to  find 
the  target  with  probability  one. 

LEMMA  1  There  exists  a  constant  e  >  0  and  a  time 
T  >  0  such  that  for  every  graph  Gk  generated  by  the 
graph  construction  process  and  every  collection  of 
l- step  paths  {Pj  in  Gk,  there  exists  a  sequence 

of  sensor  tasks  {q(£  +  j)}jfo  for  which 

k+T-l 

P^team(K)  I  x(K)>W(«),P(«),q(«))  >  £. 

K=k 

Proof  Let  v,  designate  the  first  vertex  in  path  Px, 
and  choose  a  vertex  v2  for  which  there  exists  an  edge 
e  £  Ek  connecting  vx  and  v2.  This  is  always  possible 
because  the  Delaunay  triangulation  guarantees  that  Gk 
is  connected.  The  graph  construction  process  ensures 
that  W(x(k),w(k),v2)  >  r,  where  r  is  the  reward 
threshold  selected  in  step  2  of  the  graph  construction 
process.  Let  Pmax  be  the  maximum  Euclidean  distance 
between  any  two  vertices  in  Gk.  Using  the  fact 
that  agents  travel  with  unit  velocity,  denote  the 
maximum  number  of  time  steps  between  two  vertices 
by  T  :=  |_PmaxJ .  By  Assumption  1,  we  have  that 
W(x(k  +  T),w(k  +  T),v2)  >  yTW(x(k),yv(k),v2),  and 
thus  W(x(k),w(k),v2)  >  7?V.  We  now  choose  a  sensor 
schedule  S(Pa,k )  that  includes  the  sensor  task 

n 

q(k  +  m)  :=  arg  max  wfk  +  m) 

qeQn,pee  j=l 

x  /  6(Xj(k  +  m)  —  x)D(p,q,x))dx 
Jn 

for  some  m  <  T,  where  Qe  is  the  set  of  all  feasible 
sensor  tasks  for  an  agent  along  e.  Recall  from 
Section  IIB  that  the  minimum  value  for  the  probability 
of  detection  is  denoted  by  pmin.  The  minimum  amount 
of  target  weight  contained  within  the  sensor  task 
q(k  +  in)  is  achieved  for  a  uniform  target  weight 
distribution,  and  is  given  by  7rTpmin(AFOV/AFOR), 
where  AF0V  is  the  minimum  area  of  the  FOV 
and  AFOR  is  the  maximum  area  of  the  FOR  for 
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TABLE  I 

Cooperative  Graph-Based  Model  Predictive  Search  Algorithm 


ALGORITHM  1  CGBMPS 
1:  k:=  0 

2:  Construct  G0  =  (I^,-E0)  as  in  Section  IIIB 

3:  For  each  agent  ae  {1  assign  an  arbitrary  initial  path  P  =  (v“,v")  on  G0  and  a  corresponding  sensor  schedule  S(P, 0) 

4:  while  target  has  not  been  discovered  do 

5:  Each  agent  a  points  its  sensor  at  qa(k)  and  takes  measurements 

6:  Compute  the  set  of  agents  arriving  at  waypoints  A^l!in(k) 

7:  if  the  set  A^n(k)  is  nonempty  then 

8:  Update  Gk  as  in  Section  IIIB 

9:  for  each  agent  a  in  the  set  Aplan(I:)  do 

10:  Compute  the  path  P*  starting  from  and  sensor  schedule  S(P*,k)  using  (22) 

1 1 :  end  for 

12:  end  if 

13:  Agents  move  toward  next  waypoint  in  path 

14:  Evaluate  x(k  +  1),  w (k  +  1),  and  r(k  +  1)  using  (5)  and  (13) 

15:  k:=k+l 
16:  end  while 
17:  T*  :=  k—l 


any  agent/sensor  configuration.  We  conclude 
from  (14)  and  (20)  that  E?=i  > 

7rTPmin(AFOv/AFOR)(1-ELor(«))-  Therefore, 


Lemma  1  holds  with 


e  =  7  rpn 


iiG^fovMfor)  (  1  -  ^Tr(K) 


K= 0 


D.  CGBMPS  Algorithm 

The  CGBMPS  algorithm  is  described  in  Table  I.  It 
begins  with  the  initialization  of  the  graph  and  each 
agent’s  path  and  sensor  schedule  in  steps  2  and  3. 

At  each  subsequent  time  step  k,  whenever  the  set  of 
agents  Aplan(7c)  is  nonempty,  the  graph  is  updated, 
and  the  agents  in  Ap\.dn(k)  select  new  paths  and  sensor 
schedules  according  to  (22).  This  process  repeats  until 
the  target  is  found  at  time  T*,  which  is  proven  to  be 
finite  with  probability  one  in  Theorem  1 . 

Whenever  a  path  is  computed,  it  starts  from  the 
waypoint  after  the  one  that  was  reached.  There  are 
two  key  reasons  for  implementing  the  algorithm  in 
this  way.  First,  this  allows  time  for  computation  of 
the  next  path,  avoiding  situations  where  an  agent 
has  reached  a  waypoint  and  does  not  have  any 
destination  until  it  completes  a  computation.  Second, 
it  is  advantageous  to  have  one  waypoint  planned  in 
advance  so  that  sharp  turns  in  the  upcoming  path  may 
be  smoothed. 


E.  Computational  Complexity 

The  CGBMPS  algorithm  provides  a 
computationally  efficient  method  for  approximating 
the  original  search  problem  posed  in  (15)  by  using  a 
graph  that  is  designed  to  allow  agents  to  optimize  over 
a  set  containing  only  the  most  rewarding  paths.  Let 
Ke  denote  the  maximum  time  required  to  compute 


the  reward  collected  along  an  edge  e  in  the  graph. 
Performing  an  exhaustive  search  over  this  set  of 
paths  results  in  a  bound  on  the  computation  time  that 
depends  on  the  maximum  degree  of  the  graph  d,  the 
length  of  the  prediction  horizon  £,  and  the  number 
of  agents  M,  by  the  expression  M !  Ke  (:d‘ .  Note  that 
this  is  a  worst  case  computation  bound  because  in 
a  nonuniform  graph,  it  is  quite  rare  that  all  agents 
will  arrive  at  waypoints  simultaneously  and  hence 
compute  new  paths  all  at  the  same  time.  The  factor 
M\  corresponds  to  the  optimization  given  in  (22), 
in  which  an  exhaustive  search  is  performed  over 
every  possible  order  of  agents.  If  one  opts  to  use  the 
computationally  simpler,  sequential  approximation 
(23)  instead,  the  expression  becomes  MKe£d(. 
Additionally,  in  the  computation  of  the  optimal  path 
P* ,  we  can  take  advantage  of  the  property  that  the 
reward  for  the  paths  (v1,v2,v3)  and  (v,  ,v2,  v4)  is  the 
same  for  the  subpath  between  vertices  v,  and  v2.  The 
fact  that  one  only  needs  to  compute  the  reward  for 
the  segment  (v,,v2)  once,  further  reduces  the  total 
computation  required  for  this  algorithm.  This  results 
in  the  slightly  improved  bound  of  MKe  Ef=  i  dl  to 
compute  paths  for  the  team  of  agents.  It  is  clearly 
computationally  advantageous  to  keep  the  degree 
d  of  the  graph  low  and  the  prediction  horizon  l 
short.  One  can  also  reduce  the  total  number  paths 
to  evaluate  by  doing  some  reasonable  pruning  on 
the  set  of  candidate  paths  Va,  such  as  eliminating 
backtracking  and  paths  with  sharp  turns  that  the 
agents  cannot  physically  follow.  Here,  we  have  shown 
computation  results  for  an  exhaustive  search  over 
the  prediction  horizon,  but  it  should  be  noted  that 
algorithms  such  as  branch  and  bound  [5]  may  further 
reduce  complexity. 
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F.  Searching  for  Multiple  Targets 

Although  we  presented  the  CGBMPS  algorithm 
in  the  context  of  a  search  for  a  single  target,  the 
algorithm  is  easily  adapted  to  work  for  multiple 
targets.  The  main  difference  in  implementation  is  that 
instead  of  using  a  TPMM  to  model  the  pdf  of  one 
target,  one  should  model  the  combined  pdfs  of  each 
target  or  a  target  occupancy  map  as  described  in  [11]. 
Simulations  in  Section  VB  show  that  the  CGBMPS 
algorithm  performs  well  on  multiple-target  searches. 
We  leave  for  future  work  the  problem  of  determining 
multiple-target  discovery  results  analogous  to  the 
single-target  results  of  Section  IV. 

IV.  PERSISTENT  SEARCH 

In  this  section  we  show  that  the  CGBMPS 
algorithm  results  in  finite-time  target  discovery  with 
probability  one.  We  use  the  notion  of  a  persistent 
search  policy,  which  is  inspired  by  the  persistent 
pursuit  policies  discussed  in  [10]. 

A  search  policy  /.t,  as  defined  in  Section  HE  is  said 
to  be  persistent  if  there  exists  some  e  >  0  such  that 

P(pte am(*)  I  x(k),w(k),p(k),q(fc))  >  e  Vk  G  Z^° 

is  satisfied  for  the  positions  p (k)  and  sensor  tasks  q(k) 
that  are  generated  by  the  search  policy  /j.  In  other 
words,  the  probability  of  locating  the  target  at  time 
k,  given  that  it  was  not  found  previously,  is  always 
greater  than  e.  While  it  may  be  difficult  for  a  search 
policy  to  satisfy  this  property,  we  can  guarantee  the 
following  slightly  weaker  property  when  the  agents 
are  guaranteed  to  collect  positive  reward  over  some 
finite  time  horizon.  A  search  policy  is  said  to  be 
persistent  on  the  average  if  there  is  some  e  >  0  and 
some  T  gN  such  that 

r-i 

y^]P(Vic*m(k  +  0  I  x(fc  +  i)Mk  +  i),p(k  +  +  i ))  >  e 

i= 0 

\/k  e  Z^°  (25) 

is  satisfied  for  the  positions  p(k)  and  sensor  tasks  q(k) 
that  are  generated  by  the  search  policy  p.  The  time  T 
is  called  the  period  of  persistence. 

Let  Fffk)  :=  P(T*  <  k  \  p)  denote  the  distribution 
function  of  T*  given  the  search  policy  p.  We  can 
alternatively  write  this  as 


We  now  state  the  main  result  on  finite-time  target 
discovery,  which  proves  that  the  CGBMPS  algorithm 
terminates  with  probability  one. 

THEOREM  1  The  CGBMPS  algorithm  results  in  target 
discovery  infinite  time  with  probability  one  and 


'yT'rPmm  (^FQvMfOR  ) 

where 

7  is  defined  by  Assumption  1, 

T  defined  in  Lemma  1,  is  the  period  of  persistence, 
t  is  the  threshold  selected  in  step  2  of  the  graph 
construction  process,  and 

/9min  is  the  lower  bound  on  the  probability  of 
detection. 


Proof  From  Lemma  1,  we  conclude  that  the  search 
policy  generated  by  the  CGBMPS  algorithm,  using 
optimal  sensor  schedules,  satisfies 

k+T-l 

P(Vt^m(K)  I  x(«),w(«),p(K),q(K)) 

K=k 

—  T^Pmin^FOvMFOR)  i  -X/(k) 

V  K=0 


At  time  k  =  0  we  have 


T- 1 

I  x(/c),w(K),p(K),q(«;)) 

K=0 

—  T^PminO^FOv/^FOR)- 


From  the  definition  given  in  (25),  we  conclude  that 
the  search  policy  is  persistent  on  the  average,  with 
period  T  and 

e  =  1T  t  p^m(A¥OW  /  Avor). 


Theorem  1  then  follows  directly  from  Lemma  2  with  e 
as  defined  above. 


V.  SIMULATIONS 

The  CGBMPS  algorithm  was  coded  in  MATLAB 
and  simulations  were  performed  to  evaluate  the 
performance  and  scalability  of  the  search.  The 
algorithm  was  then  coded  in  C++  to  test  against 
Toyon's  SLAMEM®  simulation,  a  high-fidelity 
entity-level  simulation  environment. 

A.  MATLAB  Simulations 


k  k 

Fli{k)  =  '£,r(K)  =  l-Ha-r{Ky). 

K= 1  K= 1 

The  following  lemma  is  proved  in  [10]. 

LEMMA  2  For  a  persistent  on  the  average  search 
policy  p  with  period  T,  P(T*  <oo\  fi)  =  1,  F/t(k)  > 

1  -  (1  -  e)L*/TI  vk  e  Z^°,  and  EfiTfi  <Te  \  with  e 
given  in  (25). 


The  CGBMPS  algorithm  was  initially  tested  in 
MATLAB.  A  MATLAB  prototype  algorithm  was 
created  to  simulate  a  cooperative  search  scenario 
with  four  agents  searching  for  a  target  in  a  5  km 
by  5  km  square  region.  The  FOR  of  each  agent  was 
selected  to  be  a  circle  1  km  in  diameter  and  the  FOV 
a  circle  200  m  in  diameter.  All  agents  start  from 
the  same  initial  position,  move  at  a  constant  speed 
of  10  m/s  and  took  measurements  every  5  s.  The 
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(a)  (b) 

Fig.  4.  Snapshots  of  MATLAB  search  simulation  using  particle  filter  (a)  and  probabilistic  map  (b).  Agents  are  represented  by  solid 

circles,  and  diamonds  are  their  waypoints. 


initial  target  pdf  consisted  of  a  weighted  sum  of 
three  randomly  placed  Gaussian  distributions  with 
random  covariances,  and  the  TPMM  was  constructed 
using  a  particle  filter  as  described  in  Appendix  A. 

For  path  planning,  we  used  a  degree-3  hexagonal 
lattice  graph,  which  is  dynamically  updated  using  the 
process  described  in  Section  III,  and  the  agents  used  a 
three-step  prediction  horizon.  Fig.  4  shows  snapshots 
of  two  simulations  using  different  representations  of 
the  target  pdf.  Notice  how  the  graph  construction 
allows  the  agents  to  plan  paths  across  low-interest 
regions  to  regions  of  high  target  likelihood  that  would 
not  be  reachable  in  three  steps  on  a  uniform  lattice 
graph. 

The  first  set  of  Monte  Carlo  tests  is  designed 
to  test  the  effectiveness  of  the  cooperation  between 
agents.  Table  II  shows  the  average  discovery  times 
of  the  CGBMPS  algorithm  for  teams  of  one  to  four 
cooperating  agents.  Notice  that  two  agents  find  the 
target  in  just  over  half  the  time  it  takes  one  agent, 
and  three  agents  find  the  target  in  about  one  third 
of  the  time.  In  this  particular  scenario,  the  addition 
of  a  fourth  agent  is  nearing  the  point  of  diminishing 
returns,  but  in  general,  the  algorithm  makes  efficient 
use  of  agents.  This  efficiency  can  be  attributed  to  the 
sharing  of  target  pdf  information  between  agents. 

Table  III  compares  the  performance  of  the 
CGBMPS  algorithm  with  previously  proposed  search 
algorithms,  each  using  three  agents.  The  greedy  search 
algorithm  in  line  1  chooses  one-step  paths  on  a  static 
square  lattice  graph  that  maximize  the  probability  of 
finding  the  target.  This  is  equivalent  to  giving  each 
agent  the  choice  of  moving  straight,  left,  or  right 


TABLE  II 

Average  Time  to  Target  Discovery  for  Teams  of  1-4  Agents, 
Averaged  Over  500  Runs 


Number  of  Agents  (M)  Average  Time  to  Discovery  (s) 

1 

1401 

2 

713 

3 

486 

4 

440 

TABLE  III 

Algorithm  Performance  Comparison,  Results  Averaged  Over  500 

Runs 

Search  Method  (with  4  agents)  Average  Time  to  Discovery  (s) 

Greedy  search 

1092 

3-step  fixed  RH  Search 

846 

CGBMPS  (fixed  sensor) 

476 

CGBMPS  (joint  sensor  optimization) 

432 

at  fixed  time  intervals  and  is  similar  to  the  greedy 
pursuit  policies  proposed  in  [10].  The  results  of  line 
2  were  generated  using  a  three-step  receding-horizon 
algorithm  on  a  static  square  lattice  graph,  which  is 
similar  to  algorithms  described  in  [19],  [24],  and 
[7].  Lines  3  and  4,  when  compared  with  lines  1 
and  2,  show  that  the  CGBMPS  algorithm  performs 
significantly  better  than  the  other  algorithms  that  use 
a  fixed  time  horizon  over  static  graphs  with  edges  of 
uniform  length.  The  key  advantage  of  the  CGBMPS 
algorithm  is  its  ability  to  evaluate  longer  high-reward 
paths  than  the  fixed- step  methods,  but  with  roughly 
the  same  amount  of  computation.  Furthermore,  the 
method  of  joint  routing  and  sensor  tasking  resulted  in 
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Fig.  5.  Snapshot  of  CGBMPS  operating  in  SLAMEM  simulation  environment. 


almost  10  percent  faster  target  discovery  than  fixed 
sensor  tasking. 

B.  SLAMEM  Simulations 

We  also  implemented  the  CGBMPS  algorithm 
in  Toyon’s  high-fidelity  SLAMEM  simulation 
environment.  SLAMEM  contains  detailed  models 
for  ground  targets,  surveillance  platforms,  sensors, 
attack  aircraft,  UAVs,  data  exploitation,  multi-source 
fusion,  sensor  retasking,  and  attack  nomination. 
SLAMEM  models  road  networks,  foliage  cover,  wind, 
buildings,  and  terrain  (using  the  terrain  elevation 
data  (DTED)).  Testing  in  SLAMEM  exercised 
the  CGBMPS  algorithm  against  detailed,  realistic 
scenarios  with  multiple  UAVs  and  targets,  and  helped 
us  prepare  the  algorithm  for  flight  testing. 

SLAMEM  has  been  used  as  the  center  for  our 
system  development  on  the  hardware  implementation 
of  the  CGBMPS  algorithm.  Our  control  and  data 
fusion  algorithms  work  with  SLAMEM  sensor  and 
communications  models  to  emulate  a  complete  UAV 
autonomous  control  system.  Video  sensor  exploitation 
models  produce  measurements  to  determine  the 
presence  of  objects  in  a  region,  and  communications 
models  provide  realistic  data  link  constraints  on 
the  network  to  determine  the  feasibility  of  our 
decentralized  processing  and  control  architecture. 
SLAMEM  asset  models  accurately  represent  the 
Unicorn  UAV  platforms  and  other  standard  military 
platforms  in  the  simulation  and  execute  commands 


from  our  control  algorithms.  GVS™  supplies  realistic 
targets  of  interest  to  the  simulation. 

In  the  SLAMEM  simulations,  one  or  more  UAVs 
were  autonomously  controlled  by  the  CGBMPS 
algorithm  to  locate  one  or  more  targets  in  a  specified 
area  of  interest  (AOI).  The  parameters  for  the 
simulations  match  closely  with  the  parameters  of  the 
hardware  tests  (cf.  Section  VI).  The  search  region 
(AOI)  spanned  10.5  km2  of  gently  sloping  terrain. 

The  UAV  airspeed  was  set  to  approximately  50  km/hr, 
and  the  flight  altitude  was  fixed  to  100  m-120  m 
above  ground  level.  At  this  altitude,  the  sensor  FOV 
footprint  on  the  ground  covers  approximately  0.1  km2, 
meaning  that  the  UAVs  see  only  1%  of  the  search 
region  with  each  camera  view.  A  screenshot  of  the 
CGBMPS  SLAMEM  simulation  appears  in  Fig.  5. 

Simulation  results  measured  T* ,  the  amount  of 
time  it  took  the  UAV  or  UAV  team  to  detect  the 
target(s).  In  each  trial,  T*  results  were  averaged  over 
100  Monte  Carlo  trials.  Several  variations  of  UAV 
teams,  ground  targets,  and  search  algorithms  were 
simulated.  In  these  tests,  because  each  Monte  Carlo 
run  required  a  large  computation  effort,  we  only 
compared  CGBMPS  with  algorithms  that  we  knew 
would  perform  very  well  for  the  specific  model  used 
for  the  target  motion. 

1)  Random  Search:  As  a  UAV  approaches  a 
waypoint,  the  random  search  algorithm  selects  the 
next  UAV  waypoint  at  random  within  the  AOI.  This 
algorithm  very  often  tends  to  send  the  UAV  through 
the  middle  of  the  AOI.  By  matching  the  random  target 
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Fig.  6.  UAV  path  for  Raster  search. 


TABLE  IV 

Simulation  Results  for  Search  Algorithms  Running  in  SLAMEM 


Search  Algorithm 

T*  =  Time  to  Target 
Detection  in  Minutes 

Random 

21.1 

Raster  scan 

25.7 

CGBMPS,  Dynamic  graph,  1-step 

24.7 

CGBMPS,  Dynamic  graph,  3-step 

18.7 

CGBMPS,  Dynamic  graph,  5-step 

18.7 

Note'.  Numeric  entries  show  T *  in  minutes,  where  T*  =  the 
time  required  to  detect  a  single  target,  averaged  over  100  Monte 
Carlo  trials. 


motion  with  the  motion  of  the  random  searchers, 
we  obtained  an  algorithm  that  is  quite  effective.  In 
practice  such  matching  could  not  be  done  for  real 
targets. 

2)  Raster  Search :  The  Raster  search  algorithm 
scans  the  entire  AOI  side-to-side,  bottom-to-top,  as  in 
Fig.  6.  This  is  actually  the  optimal  search  algorithm 
to  discover  a  stationary  target  when  the  initial  target 
probability  distribution  is  uniform  over  the  search 
region. 

3)  CGBMPS :  The  algorithm  described  in  this 
paper.  This  algorithm  was  tested  with  one,  three,  and 
five-step  prediction  horizons. 

These  results  appear  in  Table  IV.  It  is  important 
to  note  the  search  results  shown  in  Tables  IV  and 
V  depend  on  the  target  motion  models.  Targets 
moving  randomly  along  the  road  network  tend  to 
spend  more  time  in  the  middle  of  the  AOI,  making 
them  more  easily  detected  by  the  random  search.  A 
different  target  motion  model  that  kept  targets  more 
stationary,  or  moving  near  the  perimeter  of  the  AOI, 
would  give  more  advantage  to  the  Raster  search. 

Our  results  are  very  encouraging  because  they  show 
that  the  CGBMPS  algorithm  compares  favorably 
with  algorithms  that  explore  specific  motion  models 
for  the  target.  In  contrast,  the  CGBMPS  algorithm 
will  perform  well  with  nearly  any  target  motion 
model  because  of  its  versatile  target  state  estimation 
framework  (Section  IIC). 

Table  V  shows  the  results  for  multiple  UAVs  and 
multiple  targets.  One  can  observe  that  each  subsequent 
target  takes  longer  to  find,  and  that  the  team  of  three 
agents  cooperates  effectively.  While,  in  this  case,  three 


TABLE  V 

This  Table  Shows  SLAMEM  Simulation  Results  for  the  CGBMPS 
Algorithm  with  Varying  Numbers  of  UAVs  and  Targets,  Averaged 
Over  100  Monte  Carlo  Trials 


#  UAVs 

#  Targets 

Time  to  (1st,  2nd,  3rd)  Target 
Detection  in  Minutes 

3 

1 

7.5 

1 

3 

7.0,  26.2,  55.2 

3 

3 

1.8,  5.7,  14.1 

UAV s  found  the  targets  on  average  more  than  three 
times  faster  than  a  single  UAV,  this  can  be  attributed 
to  advantageous  starting  positions  for  the  second  and 
third  UAVs. 

VI.  HARDWARE  IMPLEMENTATION 

The  CGBMPS  algorithm  has  been  successfully 
field-tested  using  both  Unicorn  [27]  and  Raven 
[20]  UAV  platforms.  This  section  describes  the 
hardware-software  system  setup  and  implementation 
used  for  testing  with  Unicorns,  Toy  on's  UAV  test 
platform.  We  also  present  some  results  from  the  field 
tests. 

To  facilitate  hardware-in-the-loop  (HIL)  testing 
of  the  search  algorithm,  the  SLAMEM  simulation 
environment  was  modified  to  allow  integration  of  real 
UAVs  and  ground  vehicles  with  other  simulated  assets 
and  vehicles,  all  as  part  of  one  complex  scenario  [3], 
SLAMEM  communicates  with  Virtual  Cockpit™  [28], 
the  UAV  ground  control  software,  to  receive  position 
updates  for  the  real  UAVs  and  then  render  them  in 
simulation.  SLAMEM  can  also  receive  GPS  updates 
from  a  real  target  and  render  this  target  in  simulation. 
In  this  way,  SLAMEM  can  act  as  the  HIL  display 
and  control  interface  for  the  real  UAVs  in  the  air  and 
real  vehicles  on  the  ground.  Meanwhile,  SLAMEM 
can  also  simulate  additional  assets  and  targets.  The 
real  entities  and  simulated  entities  may  all  interact 
within  SLAMEM,  facilitating  mixed  HIL + simulation 
scenarios,  as  shown  in  Fig.  7. 

A.  UAV  Platform 

Our  primary  test  platform  is  the  Unicorn  UAV 
(Fig.  8).  This  platform  is  based  on  the  Unicorn 
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Fig.  7.  Annotated  laptop  screen  capture  showing  SLAMEM  and  Virtual  Cockpit  running  on  our  ground  control  station.  Virtual  Cockpit 
(on  left)  displays  real-time  UAV  statistics  and  prior  flight  paths.  SLAMEM  (on  right)  shows  real  and  simulated  UAVs,  real  and 

simulated  targets,  roads,  and  other  control  parameters. 


Fig.  8.  Unicorn  wing  outfitted  with  Procerus®  Kestrel™  autopilot  system  and  gimbaled  video  camera. 


expanded  polypropylene  (EPP)  foam  wing  [27].  This 
is  a  radio  controlled,  60"  EPP  foam  wing  powered 
by  four  lithium-polymer  battery  packs  driving  an 
electric  motor  attached  to  a  propeller.  This  power-train 
propels  the  UAV  at  about  50  km/hr  (airspeed).  On 
a  full  charge  the  batteries  will  provide  roughly  1  hr 
of  flight  time.  Our  Unicorn  UAVs  also  house  the 
Kestral™  Autopilot  control  board,  a  gimbaled  video 


camera,  two  radio  modems,  batteries,  and  wiring 
to  connect  the  various  components.  The  UAV  uses 
separate  channels  for  the  telemetry  data  and  the  video 
stream,  thus  necessitating  two  radio  modems. 

B.  Gimbaled  Video  Camera 

A  small  480  line,  5  V,  CCD,  NTSC  video  camera 
is  attached  to  a  two  degree  of  freedom  gimbal. 
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Fig.  9.  Miniature  video  camera  and  gimbal  mechanism. 


mounted  on  the  bottom  of  each  UAV.  The  camera 
and  gimbal  can  be  seen  in  Fig.  9.  The  elevation 
has  unconstrained  movement  in  the  range  —  90°-0°. 
The  azimuth  is  limited  to  movement  in  the  range 
— 135°-45°  (with  0°  looking  straight  out  the  nose  of 
the  UAV).  The  camera  has  a  fixed  zoom  that  can  be 
changed  by  replacing  the  camera  lens. 

The  UAVs  are  typically  flown  at  100  m  altitude, 
above  ground  level  (AGL).  Intersecting  the  camera 
FOV  with  the  Earth  from  this  altitude  yields  a  FOV 
footprint  on  the  ground  covering  about  0.1  km2. 
Analog  video  from  the  camera  is  transmitted 
wirelessly  to  a  receiver  on  the  ground. 

C.  Kestral  Autopilot 

Low-level  control  logic  is  handled  by  the  Kestrel 
Autopilot  [12].  Navigation  controls  for  the  UAV 
are  communicated  to  the  Autopilot  in  the  form  of 
waypoint  commands:  geodetic  locations  above  the 
ground  in  latitude/longitude/altitude  (above  the  geoid). 
Gimbal  targeting  commands  can  be  sent  either  in 
azimuth/elevation  (relative  to  the  platform),  or  as  a 
latitude/longitude/altitude  in  an  Earth-based  coordinate 
frame.  The  system  also  allows  a  limited  degree  of 
manual  flight  and  gimbal  control  using  a  gamepad. 

The  CGBMPS  algorithm  controls  the  UAV  by 
selecting  waypoints  and  sending  them  to  the  autopilot. 
These  waypoints  come  from  nodes  in  the  search  graph 
G  (Section  IIIA),  and  are  selected  by  the  optimization 
in  (22)  or  (23). 

D.  Field  Test  Results 

A  team  of  two  Unicorns  were  autonomously 
controlled  by  the  CGBMPS  algorithm  to  cooperatively 
discover  and  locate  targets  in  a  10.5  km2  region.  The 
UAV s  search  using  dynamically  updated  probabilistic 


maps  and  particle  filters.  Each  UAV  controller  runs  a 
separate  instance  of  the  CGBMPS  algorithm,  and  each 
instance  used  estimator-based  decentralized  control  to 
estimate  current  states  and  future  search  actions  for 
the  other  UAV.  Cooperation  occurs  by  UAVs  sharing 
their  states  and  predicting  each  other’s  sensor  actions. 

One  of  the  UAV s  was  real,  while  the  other  was 
simulated  in  SLAMEM.  The  demo  included 

1  real  target  (a  pickup  truck  carrying  a  GPS  unit  to 
render  it  in  SLAMEM), 

2  simulated  targets, 

1  real  Unicorn  UAV,  and 

1  simulated  Unicorn  UAV. 

UAV  airspeed  was  approximately  50  km/hr,  flying  at 
100  m  AGL.  The  sensor  FOV  footprint  on  the  ground 
covers  approximately  0.1  km2,  meaning  that  the  UAVs 
see  only  1%  of  the  search  region  with  each  camera 
view.  All  target  motions  were  random,  including  the 
motions  of  the  real  target.  The  first  target  detection 
occurred  by  the  real  UAV  at  15:54  into  the  trial;  the 
second  target  detection  was  by  a  simulated  UAV  at 
22:17. 

Figs.  10  and  11  show  screenshots  that  were 
generated  using  recorded  data  from  the  field  tests  of 
the  CGBMPS  algorithm. 

VII.  CONCLUSIONS  AND  FUTURE  WORK 

We  have  presented  a  cooperative  search  algorithm 
that  uses  receding-horizon  optimization  on  a 
dynamically  updated  graph  and  achieves  finite-time 
target  discovery  with  probability  one.  The  novel 
contributions  of  this  algorithm  are  1)  optimizing  on 
a  dynamic  graph  updated  based  on  changing  target 
probability  distribution,  2)  using  a  waypoint-based 
prediction  horizon  allowing  for  comparison  between 
paths  of  various  lengths,  and  3)  incorporating  the  use 
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Fig.  10.  Two  Unicorn  UAVs  begin  search  of  AOI  (white  rectangle)  for  three  targets  (gray  rectangles).  Curved  UAV  search  paths  are 
shown  as  black  lines  connecting  vertices  in  the  search  graph  (not  displayed).  UAV  sensor  task  footprints  appear  as  quadrilaterals. 
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Fig.  11.  Once  target  is  detected  and  track  on  that  target  has  been  instantiated,  particle  filter  is  used  to  update  and  predict  track  state. 
Particle  cloud  appears  in  figure  as  cluster  of  dots  near  road.  UAV  has  transitioned  from  search  to  bird-dogging,  and  SLAMEM  shows 

it's  future  path  as  loop  around  predicted  position  of  target. 
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of  gimbaled  sensors  whose  orientations  can  be  jointly 
optimized  with  the  paths  of  the  agents.  Simulations 
showed  that  the  waypoint-based  prediction  horizon 
with  a  strategic  placement  of  graph  vertices 
significantly  improved  search  algorithm  performance 
over  traditional  implementations  having  a  fixed  time 
horizon  on  graphs  with  edges  of  uniform  length. 

Using  joint  routing  and  sensor  optimization  provides 
an  additional  increase  in  performance.  Furthermore, 
we  have  successfully  tested  the  CGBMPS  algorithm 
on  a  physical  system  consisting  of  two  UAVs  with 
gimbal-mounted  cameras. 

The  decentralized  version  of  the  algorithm, 
in  which  each  agent  maintains  estimates  on  the 
positions  and  sensor  measurements  of  its  teammates 
and  updates  these  estimates  whenever  it  receives 
information  from  other  agents,  warrants  further  study. 
However,  the  communication  bandwidth  requirements 
of  the  centralized  algorithm  we  have  presented  are 
relatively  low.  The  central  computer  only  needs  to 
send  waypoints  (latitude,longitude,altitude)  to  the 
UAV s,  amounting  to  no  more  than  a  few  hundred 
bytes.  The  only  data  returned  from  the  UAVs  are 
the  positive  detections,  since  negative  detections  are 
assumed  whenever  no  data  is  received. 

The  frequency  with  which  one  must  upload 
these  waypoint  packets  depends  on  the  coarseness 
of  the  search  graph.  Our  graph  construction  process 
specifies  a  vertex  spacing  approximately  twice  the 
diameter  of  the  sensor  FOR.  For  a  Raven  UAV,  this 
distance  is  about  400  m.  Flying  at  a  nominal  speed 
of  about  15  m/s,  this  leaves  about  26  s  between 
waypoints.  Before  the  UAV  reaches  each  waypoint, 
a  new  waypoint  or  multi-waypoint  path  should  be 
uploaded.  So  even  if  the  search  algorithm  determined 
a  complete  flight  path  change  at  each  recalculation, 
the  total  communication  amounts  to  a  few  hundred 
bytes  every  26  s. 

Moreover,  for  UAV s,  low-power  long-range 
communication  is  available  because  one  generally  has 
line-of-sight.  For  example,  in  another  project  [13], 
we  are  using  relatively  weak  radios  that  still  have 
60  mi  range — this  is  possible  because  we  have  of 
line-of-sight  from  UAV  to  UAV  and  because  in  this 
work  we  never  need  to  transmit  large  volumes  of  data. 
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APPENDIX.  TARGET  PROBABILITY  MIXTURE  MODEL 
EXAMPLES 

A.  Example  1:  Particle  Filter 

Particle  filtering  is  a  sequential  Monte  Carlo-based 
method  for  state  estimation  that  is  especially  well 
suited  to  systems  with  nonlinear  dynamics  and 


non-Gaussian  probability  distributions.  It  involves 
modeling  a  system  with  a  large  number  of  dynamic 
“particles”  whose  states  evolve  according  to  some 
stochastic  model.  Weights  are  typically  updated 
upon  the  arrival  of  new  measurements  and  are  then 
normalized  so  that  the  total  weight  of  the  particles 
is  equal  to  one.  For  a  particle  filter  that  estimates 
the  location  of  a  mobile  target,  the  probability  that 
the  target  is  located  within  some  region  of  the  state 
space  can  be  estimated  by  summing  the  weights 
of  the  particles  lying  in  that  region.  Hence  regions 
in  which  the  particles  are  densely  spaced  represent 
areas  with  high  target  likelihood,  while  low  density 
regions  indicate  low  target  likelihood.  Following  is  a 
generalized  model  of  a  simple  particle  filter. 

Let  x(k)  =  [xl(k),x2(k), . . .  ,xn(k)]  represent  the 
positions  of  the  particles  whose  corresponding  weights 
are  given  by  w (k)  =  [wl(k),w2(k),...,wn{k)}.  The 
system  dynamics  can  be  expressed  as  follows: 


POSITION  UPDATE : 

x,(k  +  1)  = 

fx(Xi(k),Vi(k)) 

(26) 

WEIGHT  UPDATE : 

w^k  +  1)  = 

M 

:  Wt{k) 

1 

x  ( 1  D(pa(k),qa(k),Xi(.k)y) 

(27) 

RENORMALIZATION : 

yv(k  +  1 )  = 

w  (k  +  1)  = — J— 

:  +  l) 

(28) 

where  w (k)  :=  6w ,  (k), . 

■■,wn(k))  is 

an  intermediate 

vector  of  unnormalized  target  weights  and  \(k) :  = 

(V|  (k), . . . , vn(k))  is  a  process  noise  sequence.  The 
function  fx  is  used  to  propagate  the  particle  positions 
according  to  a  model  of  the  target  dynamics, 
randomized  with  \(k).  For  our  purposes,  the  noise 
sequence  v(k)  is  known  by  all  agents.  In  this  case,  the 
weight  update  equation  (27)  uses  the  fact  that  the  delta 
function  defined  in  (3)  takes  the  form  of  a  Dirac  delta 
function. 

We  omit  from  this  section  a  discussion  of  specific 
particle  filter  implementations  in  favor  of  a  more 
general  model.  However,  much  of  the  power  of 
particle  filter  estimation  lies  in  the  refined  sampling 
and  resampling  techniques  used  in  the  more  advanced 
models.  We  refer  the  reader  to  [1]  for  a  survey  of 
several  particle  filtering  methods. 

B.  Example  2:  Grid-Based  Probabilistic  Map 

A  grid-based  probabilistic  map  is  another  method 
for  estimating  the  states  of  an  uncertain  system.  It 
involves  representing  a  system’s  state-space  by  a  grid 
of  cells,  each  of  which  has  a  weight  corresponding  to 
the  probability  of  the  target  being  located  inside  that 
cell.  The  cell  weights  are  updated  based  on  incoming 
measurements  and  predicted  future  states.  Following  is 
a  generalized  implementation  of  a  probabilistic  map. 
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Let  x  =  \xl,x2,.-.,xlt\  represent  the  static  center  [9] 

points  of  the  cells,  whose  corresponding  weights  are 
given  by  w (k)  =  [vv, (k),w2(k), . . ,,wn(kj\.  The  system 
dynamics  can  be  expressed  by 


WEIGHT  UPDATE : 


DIFFUSION  : 

RENORMALIZATION  : 


+  1)  =  Wf(k)  J  6(xt-x) 


x  Il(1  “  D{pa(k),qa{lc),x{k)))dx 

a=  1 

(29) 

w  (k  +  1)  =  fw(y/(k  +  1))  (30) 


w(fc  +  1)  =  w  (k  +  1)  „  »  — — 


(31) 

where  w (k)  and  w (k)  are  intermediate  vectors  of 
target  weights.  For  this  grid-based  method,  the  delta 
function  S(-)  is  a  two-dimensional  rectangular  impulse 
function  the  size  of  one  grid  cell.  The  function  fw  can 
be  used  to  diffuse  target  weights  between  adjacent 
cells  according  to  a  transition  probability  matrix.  Note 
that  since  the  xt  are  fixed  in  a  grid-based  probabilistic 
map,  there  is  no  position  update.  See  [11]  for  a  more 
detailed  description. 
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